Overview

Background on IPMs, outline of salmonIPM

Model Description

Process Model

Egg Deposition and Egg-to-Smolt Survival

Expected smolt production as a function of spawner abundance can follow any of three spawner-recruit functional forms:

\[ f( S_{jt} | \alpha_{jt}, M_{\text{max},j} ) = \begin{cases} \alpha_{jt} S_{jt} & \text{exponential} \\ \dfrac{ \alpha_{jt} S_{jt} }{ 1 + \alpha_{jt} S_{jt} / M_{\text{max},j} } & \text{Beverton-Holt} \\ \alpha_{jt} S_{jt} \text{exp}\left(- \dfrac{ \alpha_{jt}S_{jt} }{ \text{exp}(1) M_{\text{max},j} } \right) & \text{Ricker} \end{cases} \]
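To make the three forms concrete, here is a minimal base-R sketch. The function name `SR()` and the parameter values are hypothetical and chosen only for illustration; this is not the salmonIPM implementation.

```r
# Illustrative spawner-recruit function; SR() is a hypothetical helper,
# not part of salmonIPM
SR <- function(S, alpha, Mmax, SR_fun = c("exp", "BH", "Ricker")) {
  SR_fun <- match.arg(SR_fun)
  switch(SR_fun,
         exp    = alpha * S,
         BH     = alpha * S / (1 + alpha * S / Mmax),
         Ricker = alpha * S * exp(-alpha * S / (exp(1) * Mmax)))
}

alpha <- 500   # hypothetical smolts per spawner at low density
Mmax  <- 1e5   # hypothetical maximum smolt production
S     <- c(10, 100, 1000, 1e4)

# Expected smolts under each form (rows = spawner abundances)
sapply(c("exp", "BH", "Ricker"), function(f) SR(S, alpha, Mmax, f))

# The Ricker curve peaks at S = e * Mmax / alpha, where production equals Mmax
S_peak      <- exp(1) * Mmax / alpha
ricker_peak <- SR(S_peak, alpha, Mmax, "Ricker")
```

With this parameterization \(M_{\text{max},j}\) has the same meaning in both density-dependent forms: it is the asymptote of the Beverton-Holt curve and the peak of the Ricker curve.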

Intrinsic productivity \(\alpha_{jt}\) is the mean of age-specific female fecundity \(\mu_{E,a}\), weighted by the spawner age distribution \(q_{ajt}\), multiplied by the proportion of female spawners \(q_{F,jt}\) and the population-specific density-independent egg-to-smolt survival \(\psi_j\):

\[ \alpha_{jt} = \psi_{j} q_{F,jt} \sum_{a=3}^{5} q_{ajt} \mu_{E,a} \]
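As a quick worked example of this calculation, the sketch below computes intrinsic productivity for one population and year. All input values are hypothetical round numbers, not estimates.

```r
# Hypothetical inputs for a single population-year
psi      <- 0.59                  # density-independent egg-to-smolt survival
q_F      <- 0.5                   # proportion of female spawners
q        <- c(0.23, 0.72, 0.05)   # spawner age distribution (ages 3, 4, 5)
mean_fec <- c(2592, 2857, 2869)   # mean fecundity of age-3, -4, -5 females

# Age-weighted mean eggs per female, then smolts per spawner at low density
eggs_per_female <- sum(q * mean_fec)
alpha <- psi * q_F * eggs_per_female
alpha
```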

Maximum smolt production (“capacity”) varies randomly among populations according to the hyperdistribution

\[ \text{log}(M_{\text{max},j}) \sim N(\mu_{M_{\text{max}}}, \sigma_{M_{\text{max}}}) \]

Smolt Recruitment Process Error

Smolt recruitment has lognormal process errors, composed of a common ESU-level autoregressive trend plus independent population-specific shocks:

\[ \begin{aligned} M_{jt} &= f( S_{jt} | \alpha_{jt}, M_{\text{max},j} ) \, \text{exp}( \eta^\text{year}_{M,t} + \epsilon_{M,jt} ) \\ \eta^\text{year}_{M,t} &\sim N(\rho_{M} \eta^\text{year}_{M,t-1}, \sigma^\text{year}_{M}) \\ \epsilon_{M,jt} &\sim N(0, \sigma_{M}) \end{aligned} \]
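The structure of this process error is easy to see in a small simulation. The sketch below uses hypothetical parameter values and a constant expected production, and it assumes the year effect starts from its stationary distribution, which the equations above do not specify.

```r
set.seed(1)
n_pop  <- 3
n_year <- 20

# Hypothetical process error parameters
rho_M        <- 0.1
sigma_year_M <- 0.46
sigma_M      <- 0.30

# Shared AR(1) year effect; stationary initial state is an assumption
eta_year_M    <- numeric(n_year)
eta_year_M[1] <- rnorm(1, 0, sigma_year_M / sqrt(1 - rho_M^2))
for (t in 2:n_year) {
  eta_year_M[t] <- rnorm(1, rho_M * eta_year_M[t - 1], sigma_year_M)
}

# Independent shocks for each population and year
eps_M <- matrix(rnorm(n_pop * n_year, 0, sigma_M), n_pop, n_year)

# Hypothetical expected smolt production f(S), held constant for simplicity
f_S <- matrix(5e4, n_pop, n_year)

# Realized smolts: year effect is added to every population's column t
M <- f_S * exp(sweep(eps_M, 2, eta_year_M, "+"))
```

Because `eta_year_M` is shared, good and bad years are synchronized across populations, while `eps_M` lets each population deviate from the common trend.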

Smolt-to-Adult Survival

The SAR process is modeled as logistic normal with an ESU-level autoregressive trend plus unique independent shocks

\[ \begin{aligned} \text{logit}( s_{MS,jt} ) &= \text{logit}( \mu_{MS} ) + \eta^\text{year}_{MS,t} + \epsilon_{MS,jt} \\ \eta^\text{year}_{MS,t} &\sim N(\rho_{MS} \eta^\text{year}_{MS,t-1}, \sigma^\text{year}_{MS}) \\ \epsilon_{MS,jt} &\sim N(0, \sigma_{MS}) \end{aligned} \]
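Working on the logit scale keeps survival between 0 and 1 no matter how large the anomalies are. A minimal sketch with hypothetical values for the mean SAR, the year effects, and the shocks:

```r
mu_MS       <- 0.002             # hypothetical mean SAR
eta_year_MS <- c(-1.0, 0.0, 1.0) # hypothetical year effects (logit scale)
eps_MS      <- c(0.2, -0.1, 0.3) # hypothetical population-year shocks

# qlogis() is the logit, plogis() its inverse
s_MS <- plogis(qlogis(mu_MS) + eta_year_MS + eps_MS)
s_MS
```

Because SAR is small, an anomaly of one unit on the logit scale changes survival by a factor of roughly \(e\).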

Conditional Age-at-Return

Adult age structure is modeled by defining a vector of conditional probabilities, \(\mathbf{p}_{jt} = [p_{3jt}, p_{4jt}, p_{5jt}] ^ \top\), where \(p_{ajt}\) is the probability of an outmigrant in year \(t\) in population \(j\) returning at age \(a\), given that it survives to adulthood. The unconditional probability is given by \(s_{MS,jt} p_{ajt}\), where both SAR and \(p_a\) are functions of underlying annual marine survival and maturation probabilities that are nonidentifiable without some ancillary data. This parameterization resolves the nonidentifiability.

The conditional age probabilities follow a logistic normal process model with hierarchical structure across populations and through time within each population. The additive log ratio,

\[ \text{alr}(\mathbf{p_{jt}}) = \left[ \text{log} \left( \dfrac{p_{3jt}}{p_{5jt}} \right), \text{log} \left( \dfrac{p_{4jt}}{p_{5jt}} \right) \right] ^ \top \]

has a bivariate normal distribution:

\[ \begin{aligned} \text{alr}(\mathbf{p_{jt}}) &= \text{alr}(\boldsymbol{\mu}_\mathbf{p}) + \boldsymbol{\eta}^\text{pop}_{\mathbf{p}, j} + \boldsymbol{\epsilon}_{\mathbf{p}, jt} \\ \boldsymbol{\eta}^\text{pop}_{\mathbf{p}, j} &\sim N(\mathbf{0}, \boldsymbol{\Sigma}^\text{pop}_\mathbf{p}) \\ \boldsymbol{\epsilon}_{\mathbf{p}, jt} &\sim N(\mathbf{0}, \boldsymbol{\Sigma}_\mathbf{p}). \end{aligned} \]

Here the 2 \(\times\) 2 covariance matrices \(\boldsymbol{\Sigma}^\text{pop}_\mathbf{p}\) and \(\boldsymbol{\Sigma}_\mathbf{p}\) allow correlated variation among age classes (on the unconstrained scale, not merely due to the mathematical simplex constraint on \(\mathbf{p}\)) across populations and through time within a population, respectively. For example, some populations or cohorts may skew overall younger or older than average. We parameterize each covariance matrix by a vector of standard deviations and a correlation matrix:

\[ \begin{aligned} \boldsymbol{\Sigma}^\text{pop}_\mathbf{p} &= \text{diag}(\boldsymbol{\sigma}^\text{pop}_\mathbf{p}) \, \mathbf{R}_\mathbf{p}^\text{pop} \, \text{diag}(\boldsymbol{\sigma}^\text{pop}_\mathbf{p}) \\ \boldsymbol{\Sigma}_\mathbf{p} &= \text{diag}(\boldsymbol{\sigma}_\mathbf{p}) \, \mathbf{R}_\mathbf{p} \, \text{diag}(\boldsymbol{\sigma}_\mathbf{p}) \end{aligned} \]
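The sketch below walks through the age-structure model in base R: the additive log ratio and its inverse, a covariance matrix built from standard deviations and a correlation matrix (read here as scaling the rows and columns of the correlation matrix by the SDs), and one random draw of conditional age probabilities. The helpers `alr()` and `inv_alr()` and all numeric values are hypothetical, and the population-level term is omitted for brevity.

```r
# Illustrative helpers, not salmonIPM functions
alr <- function(p) log(p[-length(p)] / p[length(p)])
inv_alr <- function(x) {
  e <- exp(c(x, 0))   # the reference (oldest) age class has log ratio 0
  e / sum(e)
}

# Hypothetical hyperparameter values
mu_p    <- c(0.23, 0.72, 0.05)                  # mean age distribution
sigma_p <- c(1.70, 0.87)                        # SDs on the alr scale
R_p     <- matrix(c(1, 0.75, 0.75, 1), 2, 2)    # correlation matrix

# Covariance matrix from SDs and correlations
Sigma_p <- diag(sigma_p) %*% R_p %*% diag(sigma_p)

# One bivariate normal deviation via the Cholesky factor
set.seed(123)
L     <- t(chol(Sigma_p))            # lower triangular, L %*% t(L) = Sigma_p
eps_p <- as.vector(L %*% rnorm(2))

# Conditional age probabilities for one cohort
p_jt <- inv_alr(alr(mu_p) + eps_p)
p_jt
```

A positive correlation on the alr scale means that cohorts with relatively more age-3 returns (compared to age 5) also tend to have relatively more age-4 returns, i.e. they skew younger overall.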

Adult Recruitment

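Wild adult returns are the sum over ages of smolts that survive to adulthood and return at each age, minus broodstock removals \(B_{jt}\), which are assumed known. Harvest is assumed to be zero for now.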

\[ S_{\text{W}, jt} = \left(\sum_{a=3}^{5} s_{MS,j,t-a} \hspace{0.1cm} p_{aj,t-a} \hspace{0.1cm} M_{j,t-a} \right) - B_{jt} = \left(\sum_{a=3}^{5} \tilde{S}_{\text{W}, ajt} \right) - B_{jt} \]

Spawner age structure is \(\mathbf{q}_{jt} = [q_{3jt}, q_{4jt}, q_{5jt}]\), where \(q_{ajt} = \tilde{S}_{\text{W},ajt} / S_{jt}\).

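Hatchery-origin spawner abundance follows from wild spawner abundance and the proportion of hatchery-origin spawners, \(p_{\text{HOS},jt}\):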

\[ S_{\text{H},jt} = S_{\text{W},jt} p_{\text{HOS},jt} / (1 - p_{\text{HOS},jt}) \]

Total spawner abundance is then \(S_{jt} = S_{\text{W},jt} + S_{\text{H},jt}\).
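Here is a minimal sketch of the adult recruitment bookkeeping for a single population and return year. All values are hypothetical, and the year indexing follows the equations as written (adults of age \(a\) in year \(t\) come from \(M_{j,t-a}\)).

```r
ages   <- 3:5
n_year <- 8

# Hypothetical inputs, constant through time for simplicity
M     <- rep(1e5, n_year)      # smolts
s_MS  <- rep(0.003, n_year)    # smolt-to-adult survival
p     <- matrix(c(0.23, 0.72, 0.05), n_year, 3, byrow = TRUE)  # age-at-return
B     <- rep(10, n_year)       # broodstock removals
p_HOS <- 0.2                   # proportion of hatchery-origin spawners

t <- 7   # return year

# Wild adults by age, before broodstock removal
S_W_tilde <- sapply(seq_along(ages), function(i) {
  a <- ages[i]
  s_MS[t - a] * p[t - a, i] * M[t - a]
})

S_W <- sum(S_W_tilde) - B[t]        # wild spawners
S_H <- S_W * p_HOS / (1 - p_HOS)    # hatchery-origin spawners
S   <- S_W + S_H                    # total spawners
c(S_W = S_W, S_H = S_H, S = S)
```

The expression for `S_H` is just the definition \(p_{\text{HOS}} = S_\text{H} / (S_\text{W} + S_\text{H})\) solved for \(S_\text{H}\), so `S_H / S` recovers `p_HOS`.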

Observation Model

Fecundity

We modeled observations of fecundity from individual female chum salmon collected at hatcheries. The likelihood for the fecundity of female \(i\) of age \(a\) is a zero-truncated normal with age-specific mean and SD.

\[ E_{a,i}^\text{obs} \sim N(\mu_{E,a}, \sigma_{E,a}) \hspace{0.1cm} T[0, \infty) \]
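For reference, the zero-truncated normal density is the ordinary normal density renormalized by the probability mass above zero. The function `dtnorm0()` below is a hypothetical helper written for illustration, and the fecundity values are made up.

```r
# Density of a normal truncated to [0, Inf); dtnorm0() is a hypothetical helper
dtnorm0 <- function(x, mean, sd, log = FALSE) {
  ld <- dnorm(x, mean, sd, log = TRUE) -
    pnorm(0, mean, sd, lower.tail = FALSE, log.p = TRUE)
  ld[x < 0] <- -Inf    # no mass below zero
  if (log) ld else exp(ld)
}

# Log-likelihood of some made-up fecundity observations for one age class
E_obs <- c(2100, 2750, 3300, 2900)
ll_E  <- sum(dtnorm0(E_obs, mean = 2850, sd = 560, log = TRUE))
ll_E
```

When the mean is several SDs above zero, as with fecundities in the thousands, the truncation correction is negligible and the likelihood is essentially normal.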

Smolt and Spawner Abundance

Smolt and spawner abundance are given informative priors, derived from Bayesian observation models applied to field data of various kinds:

\[ \begin{aligned} \text{log}(M_{jt}) &\sim N(\mu_{M,jt}, \tau_{M,jt}) \\ \text{log}(S_{jt}) &\sim N(\mu_{S,jt}, \tau_{S,jt}) \end{aligned} \]

Some prior observation error SDs are missing or unknown, so they were imputed by fitting a lognormal hyperdistribution to the known SDs:

\[ \begin{aligned} \text{log}(\tau_{M,jt}) &\sim N( \mu_{\tau_M}, \sigma_{\tau_M}) \\ \text{log}(\tau_{S,jt}) &\sim N( \mu_{\tau_S}, \sigma_{\tau_S}) \end{aligned} \]
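The idea behind the imputation can be illustrated with a simple plug-in analogue. In the IPM the hyperparameters and the missing SDs are estimated jointly with everything else, so the two-step version below is a simplification, and the SD values are hypothetical.

```r
set.seed(42)

# Hypothetical known observation error SDs (log scale)
tau_known <- c(0.05, 0.08, 0.12, 0.20, 0.06, 0.10)

# Plug-in estimates of the lognormal hyperparameters
mu_tau    <- mean(log(tau_known))
sigma_tau <- sd(log(tau_known))

# Draws for a missing SD from the fitted hyperdistribution
tau_imputed <- rlnorm(1000, meanlog = mu_tau, sdlog = sigma_tau)
quantile(tau_imputed, c(0.05, 0.5, 0.95))
```

An observation with an imputed SD therefore borrows its precision from the observations whose SDs are known, with extra uncertainty reflecting the spread among them.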

Spawner Age and Origin Composition

Age composition of wild spawners \(\mathbf{n}_{jt}^\text{obs} = [n_{3jt}^\text{obs}, n_{4jt}^\text{obs}, n_{5jt}^\text{obs}] ^\top\) is assumed to follow a multinomial likelihood with the expected proportions given by the unobserved true state:

\[ \mathbf{n}_{jt}^\text{obs} \sim \text{Multinomial} \left( \sum_a n_{ajt}^\text{obs}, \mathbf{q}_{jt} \right) \]

The number of hatchery-origin fish in a sample of spawners of known origin is assumed to follow a binomial likelihood:

\[ n_{\text{H},jt}^\text{obs} \sim \text{Bin} \left( n_{\text{W},jt}^\text{obs} + n_{\text{H},jt}^\text{obs}, p_{\text{HOS},jt} \right) \]
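Both composition likelihoods can be evaluated with base R density functions. The counts and true-state values below are hypothetical.

```r
# Hypothetical age-frequency sample (ages 3, 4, 5) and true age distribution
n_age_obs <- c(12, 40, 3)
q_jt      <- c(0.23, 0.72, 0.05)
ll_age <- dmultinom(n_age_obs, size = sum(n_age_obs), prob = q_jt, log = TRUE)

# Hypothetical origin sample and true proportion of hatchery-origin spawners
n_H_obs <- 8
n_W_obs <- 47
p_HOS   <- 0.2
ll_origin <- dbinom(n_H_obs, size = n_W_obs + n_H_obs, prob = p_HOS, log = TRUE)

c(age = ll_age, origin = ll_origin)
```

The binomial is the two-category special case of the multinomial, so the origin likelihood could equally be written with `dmultinom()`.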

Priors

Setup and Data

Load the packages we’ll need…

options(device = ifelse(.Platform$OS.type == "windows", "windows", "quartz"))
options(mc.cores = parallel::detectCores(logical = FALSE) - 1)

library(salmonIPM)
library(rstan)
library(shinystan)
library(matrixStats)
library(Hmisc)
library(dplyr)
library(tidyr)
library(yarrr)
library(magicaxis)
library(viridis)
library(zoo)
library(ggplot2)
theme_set(theme_bw(base_size = 16))
library(here)

# load data
source(here("analysis","R","01_LCRchumIPM_data.R"))
# load plotting functions
source(here("analysis","R","03_LCRchumIPM_plots.R"))
# load saved stanfit objects
if(file.exists(here("analysis","results","LCRchumIPM.RData")))
  load(here("analysis","results","LCRchumIPM.RData"))

Read in and manipulate the data…

Let’s look at the first few rows of fish_data_SMS to see the format salmonIPM expects…

head(fish_data_SMS)

Retrospective Models

Fit two-stage spawner-smolt-spawner models and explore output…

We fit exponential, Beverton-Holt, and Ricker models, but model comparison using leave-one-out cross-validation (LOO) is not feasible, so here we focus on the Ricker model.

LCRchum_Ricker <- salmonIPM(fish_data = fish_data_SMS, fecundity_data = fecundity_data,
                            ages = list(M = 1), stan_model = "IPM_LCRchum_pp", SR_fun = "Ricker",
                            log_lik = TRUE, chains = 3, iter = 1500, warmup = 500,
                            control = list(adapt_delta = 0.99, max_treedepth = 14))
print(LCRchum_Ricker, prob = c(0.05,0.5,0.95),
      pars = c("psi","Mmax","eta_year_M","eta_year_MS","eta_pop_p","mu_pop_alr_p","p","p_F",
               "tau_M","tau_S","p_HOS","B_rate","E_hat","M","S","s_MS","q","q_F","LL"), 
      include = FALSE, use_cache = FALSE)
Inference for Stan model: IPM_LCRchum_pp.
3 chains, each with iter=1500; warmup=500; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

                    mean se_mean    sd        5%       50%       95% n_eff Rhat
mu_E[1]          2591.57    0.64 44.93   2518.44   2591.13   2667.22  4936 1.00
mu_E[2]          2856.96    0.30 24.60   2817.14   2856.61   2897.67  6685 1.00
mu_E[3]          2868.99    1.11 73.02   2747.91   2869.97   2985.28  4357 1.00
sigma_E[1]        510.29    0.48 33.22    457.73    509.36    566.74  4833 1.00
sigma_E[2]        560.74    0.28 17.22    533.23    560.26    590.04  3829 1.00
sigma_E[3]        435.38    0.93 55.47    355.67    429.31    534.84  3584 1.00
delta_NG            0.57    0.01  0.24      0.17      0.58      0.95  1626 1.00
mu_psi              0.59    0.00  0.08      0.46      0.58      0.73   668 1.01
sigma_psi           0.40    0.01  0.26      0.05      0.37      0.87   756 1.00
mu_Mmax             7.28    0.02  0.58      6.44      7.24      8.27   675 1.00
sigma_Mmax          1.31    0.01  0.47      0.75      1.21      2.17  1016 1.00
rho_M               0.10    0.02  0.43     -0.64      0.13      0.75   519 1.00
sigma_year_M        0.46    0.00  0.12      0.30      0.45      0.68  1130 1.00
sigma_M             0.30    0.00  0.05      0.22      0.29      0.37   707 1.00
mu_MS               0.00    0.00  0.00      0.00      0.00      0.00  1515 1.00
rho_MS              0.49    0.01  0.22      0.10      0.52      0.80   717 1.00
sigma_year_MS       1.03    0.01  0.22      0.73      1.00      1.43  1017 1.00
sigma_MS            0.56    0.00  0.05      0.47      0.56      0.66   627 1.01
mu_p[1]             0.23    0.00  0.02      0.20      0.23      0.27   461 1.01
mu_p[2]             0.72    0.00  0.02      0.69      0.72      0.75   549 1.01
mu_p[3]             0.04    0.00  0.01      0.03      0.04      0.05   444 1.00
sigma_pop_p[1]      0.20    0.01  0.17      0.02      0.16      0.53   290 1.02
sigma_pop_p[2]      0.14    0.01  0.12      0.01      0.11      0.37   332 1.01
R_pop_p[1,1]        1.00     NaN  0.00      1.00      1.00      1.00   NaN  NaN
R_pop_p[1,2]        0.35    0.03  0.58     -0.78      0.53      0.98   529 1.00
R_pop_p[2,1]        0.35    0.03  0.58     -0.78      0.53      0.98   529 1.00
R_pop_p[2,2]        1.00    0.00  0.00      1.00      1.00      1.00  2848 1.00
sigma_p[1]          1.70    0.01  0.14      1.48      1.69      1.94   509 1.01
sigma_p[2]          0.87    0.00  0.09      0.72      0.86      1.03   565 1.01
R_p[1,1]            1.00     NaN  0.00      1.00      1.00      1.00   NaN  NaN
R_p[1,2]            0.75    0.00  0.06      0.64      0.76      0.85   695 1.01
R_p[2,1]            0.75    0.00  0.06      0.64      0.76      0.85   695 1.01
R_p[2,2]            1.00    0.00  0.00      1.00      1.00      1.00  3018 1.00
mu_F                0.50    0.00  0.02      0.47      0.50      0.52  1000 1.00
sigma_pop_F         0.19    0.00  0.07      0.09      0.18      0.31   770 1.00
sigma_F             0.38    0.00  0.04      0.32      0.37      0.44  1112 1.00
mu_tau_M            0.08    0.00  0.01      0.06      0.08      0.10  3418 1.00
sigma_tau_M         1.13    0.00  0.12      0.96      1.13      1.34  3193 1.00
mu_tau_S            0.11    0.00  0.01      0.10      0.11      0.12  2642 1.00
sigma_tau_S         0.98    0.00  0.06      0.89      0.98      1.08  2966 1.00
lp__           -41678.71    1.36 38.84 -41745.68 -41677.70 -41614.36   812 1.01

Samples were drawn using NUTS(diag_e) at Sun May 09 06:10:01 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Plot estimated spawner-smolt production curves and parameters for the Ricker model.

Figure 1: Estimated Ricker spawner-recruit relationship (A, B) and intrinsic productivity (C) and capacity (D) parameters for the multi-population IPM. Thin lines correspond to each of 12 populations of Lower Columbia chum salmon; thick lines represent hyper-means across populations. In (A, B), each curve is a posterior median and the shaded region represents the 90% credible interval of the hyper-mean curve (uncertainty around the population-specific curves is omitted for clarity).

Here are the fits to the spawner data:

Figure 2: Observed (points) and estimated spawner abundance for Lower Columbia River chum salmon populations. Filled points indicate known observation error SD, while SD for open points is imputed. The posterior median (solid gray line) is from the multi-population IPM. Posterior 90% credible intervals indicate process (dark shading) and observation (light shading) uncertainty.

And here are the fits to the much sparser smolt data:

Figure 3: Observed (points) and estimated smolt abundance for Lower Columbia River chum salmon populations. Filled points indicate known observation error SD, while SD for open points is imputed. The posterior median (solid gray line) is from the multi-population IPM. Posterior 90% credible intervals indicate process (dark shading) and observation (light shading) uncertainty.

To understand how the IPM is imputing the observation error SD in cases where it is not reported, let’s look at the lognormal hyperdistribution fitted to the known SD values…

Figure 4: Lognormal hyperdistributions used to impute unknown smolt and spawner observation error SDs in the IPM. The posterior median (line) and 90% credible interval (shading) of the distribution fitted to the known SD values (histogram) are shown for each life stage.

We can also compare the estimated spawner age-frequencies to the sample proportions from the BioData. Age composition varies quite a bit across populations and through time, reflecting fluctuations in cohort strength.

Figure 5: Observed (points) and estimated spawner age composition for Lower Columbia River chum salmon populations. The posterior distribution from the multi-population IPM is summarized by the median (solid line) and 90% credible interval (shading). The error bar around each observed proportion indicates the 90% binomial confidence interval based on sample size.
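For readers who want to reproduce error bars like these, here is one way to compute 90% binomial intervals for observed age proportions in base R. The sketch uses the exact (Clopper-Pearson) interval from `binom.test()`; that choice is an assumption, since the interval method used for the figure is not stated, and the counts are hypothetical.

```r
# Hypothetical age-frequency sample (ages 3, 4, 5)
n_age_obs <- c(12, 40, 3)
n_total   <- sum(n_age_obs)

# Exact 90% binomial interval for each age class, treated one at a time
ci <- t(sapply(n_age_obs, function(x) {
  binom.test(x, n_total, conf.level = 0.90)$conf.int
}))
colnames(ci) <- c("lower", "upper")

cbind(prop = n_age_obs / n_total, ci)
```

Intervals are widest for small samples, which is why sparsely sampled years can sit far from the posterior median without being inconsistent with it.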

[add sex ratio and pHOS]

Forecasting

It is straightforward to use the IPM to generate forecasts of population dynamics…

Figure 6: Observed (points) and estimated spawner abundance for Lower Columbia River chum salmon populations, including 5-year forecasts. Filled points indicate known observation error SD, while SD for open points is imputed. The posterior median (solid gray line) is from the multi-population IPM. Posterior 90% credible intervals indicate process (dark shading) and observation (light shading) uncertainty.

Of course we could also look at forecasts of smolts, or any other state variable. Here are the 2020 forecasts of wild spawners for each population…